Communication Reduction


CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference

Neural Information Processing Systems

We also compare CoPriv with SOTA network optimization methods, including SNL, MetaPruning, etc. Compared to SNL, CoPriv achieves 9.98x online and 3.88x total communication reduction, respectively, with higher accuracy.




CoPriv: Network/Protocol Co-Optimization for Communication-Efficient Private Inference

Neural Information Processing Systems

Deep neural network (DNN) inference based on secure 2-party computation (2PC) can offer cryptographically-secure privacy protection but suffers from orders of magnitude latency overhead due to enormous communication. Previous works heavily rely on a proxy metric of ReLU counts to approximate the communication overhead and focus on reducing the ReLUs to improve the communication efficiency. However, we observe that these works achieve limited communication reduction for state-of-the-art (SOTA) 2PC protocols because they ignore other linear and non-linear operations, which now contribute to the majority of communication.


LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning

Tianyi Chen, Georgios Giannakis, Tao Sun, Wotao Yin

Neural Information Processing Systems

This paper presents a new class of gradient methods for distributed machine learning that adaptively skip the gradient calculations to learn with reduced communication and computation. Simple rules are designed to detect slowly-varying gradients and, therefore, trigger the reuse of outdated gradients. The resultant gradient-based algorithms are termed Lazily Aggregated Gradient, justifying our acronym LAG used henceforth. Theoretically, the merits of this contribution are: i) the convergence rate is the same as batch gradient descent in strongly-convex, convex, and nonconvex cases; and, ii) if the distributed datasets are heterogeneous (quantified by certain measurable constants), the communication rounds needed to achieve a targeted accuracy are reduced thanks to the adaptive reuse of lagged gradients. Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.
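The core of LAG is the skip rule: a worker reuses its last transmitted gradient when the new gradient has changed little relative to recent parameter movement. A minimal sketch of such a trigger is below; the function name, the threshold constant `alpha`, and the exact form of the right-hand side are illustrative simplifications of the paper's condition.

```python
import numpy as np

def lag_should_skip(new_grad, last_sent_grad, param_deltas, alpha=0.5):
    """Illustrative LAG-style trigger: skip communication when the gradient
    has varied little since the one last sent to the server.
    `param_deltas` holds recent parameter updates; the averaged-delta
    threshold here is a simplification of the paper's condition."""
    lhs = np.linalg.norm(new_grad - last_sent_grad) ** 2
    rhs = alpha * sum(np.linalg.norm(d) ** 2 for d in param_deltas)
    rhs /= max(len(param_deltas), 1)
    # True -> server reuses the outdated gradient, saving one upload
    return lhs <= rhs

# toy usage: a nearly unchanged gradient triggers a skip
g_old = np.array([1.0, 2.0])
g_new = np.array([1.001, 2.001])
deltas = [np.array([0.1, 0.1])]
skip = lag_should_skip(g_new, g_old, deltas)
```

A worker that skips simply sends nothing for that round, and the server aggregates the stale copy it already holds, which is where the communication savings come from.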


MedHE: Communication-Efficient Privacy-Preserving Federated Learning with Adaptive Gradient Sparsification for Healthcare

Yesmin, Farjana

arXiv.org Artificial Intelligence

Healthcare federated learning requires strong privacy guarantees while maintaining computational efficiency across resource-constrained medical institutions. This paper presents MedHE, a novel framework combining adaptive gradient sparsification with CKKS homomorphic encryption to enable privacy-preserving collaborative learning on sensitive medical data. Our approach introduces a dynamic threshold mechanism with error compensation for top-k gradient selection, achieving 97.5% communication reduction while preserving model utility. We provide formal security analysis under Ring Learning with Errors assumptions and demonstrate differential privacy guarantees with epsilon <= 1.0. Statistical testing across 5 independent trials shows MedHE achieves 89.5% +/- 0.8% accuracy, maintaining comparable performance to standard federated learning (p=0.32) while reducing communication from 1277 MB to 32 MB per training round. Comprehensive evaluation demonstrates practical feasibility for real-world medical deployments with HIPAA compliance and scalability to 100+ institutions.
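Top-k gradient selection with error compensation, the mechanism the abstract describes, can be sketched in a few lines: coordinates not selected this round are accumulated in a local residual and added back before the next selection. The adaptive threshold and the CKKS encryption of the selected values are omitted here; the function and variable names are ours, not MedHE's API.

```python
import numpy as np

def sparsify_topk(grad, residual, k):
    """Illustrative top-k sparsification with error feedback.
    Returns the sparse tensor to transmit (then encrypt) and the new
    local residual of un-sent mass."""
    corrected = grad + residual                 # error compensation
    idx = np.argsort(np.abs(corrected))[-k:]    # k largest magnitudes
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                # values communicated
    new_residual = corrected - sparse           # kept locally for next round
    return sparse, new_residual

# toy usage: only the two largest-magnitude entries are communicated
g = np.array([0.5, -0.01, 0.3, 0.002])
sent, res = sparsify_topk(g, np.zeros_like(g), k=2)
```

Because the residual is re-injected each round, no gradient mass is permanently dropped, which is what lets aggressive sparsification (here the paper reports 1277 MB down to 32 MB per round) preserve model utility.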



We thank the reviewers for their detailed and constructive comments, especially during these unprecedented times

Neural Information Processing Systems

We thank the reviewers for their detailed and constructive comments, especially during these unprecedented times. Our algorithm isn't designed to compete (or ...). However, in our new experiment in Fig. B we achieve close to D-SGD. We will add to the paper an experiment with 4 different models. Reference data can be synthetic and then it is easy to obtain (as in co-regularization; see R1's comment). We now explain that in detail. The graphs in this work were randomly drawn for a given maximum number of degrees per node.



DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models

Iacob, Alex, Sani, Lorenzo, Safaryan, Mher, Giampouras, Paris, Horváth, Samuel, Jovanovic, Andrej, Kurmanji, Meghdad, Aleksandrov, Preslav, Shen, William F., Qiu, Xinchi, Lane, Nicholas D.

arXiv.org Artificial Intelligence

Scaling foundation model training with Distributed Data Parallel (DDP) methods is bandwidth-limited. Existing infrequent communication methods like Local SGD were designed to synchronize only model parameters and cannot be trivially applied to adaptive optimizers due to additional optimizer states. Current approaches extending Local SGD either lack convergence guarantees or require synchronizing all optimizer states, tripling communication costs. We propose Desynced Low Communication Adaptive Optimizers (DES-LOC), a family of optimizers assigning independent synchronization periods to parameters and momenta, enabling lower communication costs while preserving convergence. Through extensive experiments on language models of up to 1.7B parameters, we show that DES-LOC can communicate 170x less than DDP and 2x less than the previous state-of-the-art Local ADAM. Furthermore, unlike previous heuristic approaches, DES-LOC is suited for practical training scenarios prone to system failures. DES-LOC offers a scalable, bandwidth-efficient, and fault-tolerant solution for foundation model training.
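The key idea, independent synchronization periods for parameters and optimizer momenta, can be sketched as a per-step schedule. The sketch below is our own simplification under stated assumptions: `average_across_workers` stands in for an all-reduce, the period names `K_param` and `K_mom` are ours, and a real implementation would sync first and second Adam moments separately.

```python
import numpy as np

def average_across_workers(tensors):
    """Stand-in for an all-reduce: average one tensor per worker."""
    return np.mean(tensors, axis=0)

def desloc_step(step, params, momenta, K_param=8, K_mom=64):
    """Illustrative DES-LOC schedule: parameters sync every K_param steps,
    momenta only every K_mom steps, so the rarely-synced optimizer state
    adds little communication on top of Local SGD."""
    if step % K_param == 0:
        synced = average_across_workers(params)
        params = [synced.copy() for _ in params]
    if step % K_mom == 0:
        synced = average_across_workers(momenta)
        momenta = [synced.copy() for _ in momenta]
    return params, momenta

# toy usage with two workers holding diverged states
params = [np.array([1.0]), np.array([3.0])]
momenta = [np.array([0.0]), np.array([2.0])]
p, m = desloc_step(0, params, momenta, K_param=2, K_mom=4)  # both sync at step 0
```

With `K_mom` much larger than `K_param`, the amortized traffic approaches that of parameter-only Local SGD rather than the 3x cost of syncing all Adam states every round.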